Increasing student retention and persistence, whether in particular classes or in a major area of
study, is a challenge for universities. Students' academic and social integration into an institution
seems to be vital for student retention, yet research on the effect of interpersonal interactions is
rare. Social network analysis is an approach that can be used to identify patterns of interaction
that contribute to integration into the university. We analyze how students' position within a social
network in a Modeling Instruction (MI) course that strongly emphasizes interactive learning impacts
their persistence in taking a subsequent MI course. We find that students with higher centrality at
the end of the first semester of MI are more likely to enroll in a second semester of MI. While the
correlation with increased persistence is the subject of an ongoing study, these findings suggest that
student social integration influences persistence.
I. INTRODUCTION

The publication of Vincent Tinto's student integration
model marks the start of the dialogue on undergraduate
retention. In his model, Tinto introduced the notion of
"external" (e.g., families, neighborhoods, work settings)
and "internal" (e.g., learning groups within a classroom,
residence halls) communities that affect student integration
into the social and academic environment of the university
[1, 2]. Increasing the retention of students in
a particular course and their persistence in continuing
through their major area of study and finishing their degree
is a big challenge for universities. Based on the work
of Tinto and others, increasing social and academic integration
is one of the prime targets for increasing persistence.

Social network analysis (SNA) is a well-suited approach
to study student academic and social integration.
SNA can be used to identify patterns of interaction that
contribute to integration into the university. It provides a
methodology to assess the effect of interpersonal interactions
on students' persistence. While students' academic
and social integration into an institution seems to be
essential to student retention, effective implementation
of measures to prevent losing students is sparse. Developing
network methodologies for studying retention and
persistence among university students is a newly-forming
research area [5, 6].

We use SNA techniques to address questions of retention
and persistence of students at Florida International
University (FIU), a large, Hispanic Serving Institution.
In particular, we analyze how a student's position within
a social network in an introductory mechanics Modeling
Instruction (M-MI) course impacts their persistence in
taking a subsequent electricity and magnetism MI (EM-MI)
course. Modeling Instruction is a guided-inquiry
interactive-engagement method of teaching that organizes
instruction around building, testing and applying
a handful of scientific models that represent the content
core of physics. Instead of relying on lectures and textbooks,
the MI program emphasizes active student construction
of conceptual and mathematical models in a
strongly interactive learning community. It is therefore
an important case for studying the effects of building student
communities on promoting persistence.

II. METHODOLOGY 

To collect social network data we developed a
pencil-and-paper survey that was administered in the
introductory mechanics MI course in Fall 2014. Every
four weeks throughout the semester students were asked
the following question:

Name the individual(s) (first and last name) 
you had a meaningful classroom interaction* 
with today, even if you were not the main 
person speaking or contributing. (You may 
include names of students outside of the 
group you usually work with) 

*A classroom interaction includes but is not limited
to people you worked with to solve physics
problems and people that you watched or listened
to while solving physics problems.

In the Spring semester, we presented students in the electricity
and magnetism MI course with a modified version
of the survey, containing a roster with names of all students
enrolled in the course and a weighted version of the
question about interactions, as shown in Fig. 1.

The SNA data were collected over one semester of the
M-MI course (Fall 2014) and one semester of the EM-MI
course (Spring 2015). Both sections were taught by the
same instructor accompanied by teaching assistants (the
same two TAs in both semesters) and learning assistants
(three LAs in the Fall semester and two LAs in the Spring
semester, with one person overlapping). In each semester we
collected SNA data five times throughout the duration of
the course. The total number of students enrolled was 73
in M-MI and 74 in EM-MI. Both MI courses were taken by
40 students, and a second semester of physics in a more
traditional arrangement was taken by 10 students from M-MI.
The response rates on all surveys but one were over 75%;
we therefore excluded the survey with an unusually
low return (43%), the last survey in the Fall semester,
from the analysis. In our analysis we use the last
valid survey from the Fall semester, that is SNA4.

    Please choose from the list below people from your
    physics class that you had a meaningful interaction with
    in class this week, even if you were not the main person
    speaking or contributing. You may include names of
    students outside of the group you usually work with.
    You don't have to fill in all columns. You may use the
    name of a student or their corresponding number.

    I had a meaningful interaction with these people this week in class...

    ...one time.    ...more than one time     ...every day.
                    but NOT every day.

    [Roster containing names of students enrolled in EM-MI]

FIG. 1: An excerpt of the SNA survey presented to students
in the Spring 2015 semester.

SNA uses the notion of nodes (in our case students enrolled
in M-MI) and edges (the interactions identified by
students in the survey) to represent the network. From a
graph theoretic perspective, the relative importance of a
node within a graph is determined using centrality measures.
To answer the question "Who are the most important
nodes in a network?" one has to determine how
central each node is [7]. Evaluating the relative position
of nodes in the network helps to understand the network
and its participants.
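
As a minimal sketch of how survey responses map onto nodes and edges, the Python snippet below builds a directed 0/1 adjacency matrix from lists of named classmates. The student names and the helper `build_adjacency` are hypothetical illustrations; they are not taken from the study's data or from its analysis code, which was written in R.

```python
# Hypothetical survey responses: each respondent maps to the classmates they named.
responses = {
    "Ana": ["Ben", "Cruz"],
    "Ben": ["Ana"],
    "Cruz": ["Ana", "Dee"],
    "Dee": [],
}

def build_adjacency(responses):
    """Return (sorted node list, 0/1 adjacency matrix) of a directed network.

    Entry x[i][j] is 1 if respondent i named student j, and 0 otherwise.
    """
    named_students = {name for named in responses.values() for name in named}
    nodes = sorted(set(responses) | named_students)
    index = {name: k for k, name in enumerate(nodes)}
    x = [[0] * len(nodes) for _ in nodes]
    for respondent, named in responses.items():
        for other in named:
            if other != respondent:  # ignore self-nominations
                x[index[respondent]][index[other]] = 1
    return nodes, x

nodes, x = build_adjacency(responses)
```

Students who are named by others but did not return the survey still appear as nodes, which is one reason the response rate matters for the resulting network.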

There are various measures of centrality that quantify
the importance of nodes and edges. In this paper we will
focus on the four most commonly used measures: degree,
eigenvector, betweenness and closeness (see Fig. 2).

The degree centrality of a node $i$, $C_D(i)$, is the number
of edges connected to it,

$$C_D(i) = \sum_{j=1}^{n} x_{ij} = \sum_{j=1}^{n} x_{ji},$$

where $x_{ij}$ is the value of the edge from node $i$ to node $j$
(the value being either 1 if the tie is present or 0 otherwise)
and $n$ is the number of nodes in a network. In the
case of a directed network, that is a network that takes
into account the origin of an edge, one can define two
additional measures of degree centrality: indegree (the
number of ties directed to the node, which can be interpreted
as popularity) and outdegree (the number of ties that the
node directs to others, which can be interpreted as sociability):

$$C_{inD}(i) = \sum_{j=1}^{n} x_{ji}, \qquad C_{outD}(i) = \sum_{j=1}^{n} x_{ij}.$$
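
The degree definitions can be illustrated with a short Python sketch. The 4-student adjacency matrix is hypothetical, and `undirected_degree` reflects one possible convention (a tie counts if either student named the other); the text does not say how direction was collapsed for the total degree.

```python
# Hypothetical directed network: x[i][j] = 1 if student i named student j.
x = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]

def out_degree(x, i):
    """Number of ties that node i directs to others (sum over row i)."""
    return sum(x[i])

def in_degree(x, i):
    """Number of ties directed to node i (sum over column i)."""
    return sum(row[i] for row in x)

def undirected_degree(x, i):
    """Degree when direction is ignored (assumed convention for this sketch)."""
    return sum(1 for j in range(len(x)) if j != i and (x[i][j] or x[j][i]))

n = len(x)
in_degrees = [in_degree(x, i) for i in range(n)]
out_degrees = [out_degree(x, i) for i in range(n)]
degrees = [undirected_degree(x, i) for i in range(n)]
```

In this toy network the last student names nobody (outdegree 0) but is named once (indegree 1), which shows why the two directed measures can tell different stories about the same person.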

The eigenvector centrality is the sum of a node's connections
to other nodes weighted by their centralities and
it measures the influence of a node in a network. It is
given by an eigenvector, $C_E$, of an adjacency matrix, $A$,
corresponding to the greatest eigenvalue, $\lambda_{max}$, that is

$$A\,C_E = \lambda_{max}\,C_E.$$

$A$ is a matrix related to a graph by $a_{ij} = 1$ if a node $i$ is
connected to a node $j$ by an edge and 0 if it is not, and
$C_E$ is a vector containing the centralities of all nodes in
the network.
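
One common way to obtain the leading eigenvector is power iteration; the Python sketch below is an illustration under that assumption and is not the procedure reported in the paper. It iterates with $A + I$ rather than $A$, which has the same eigenvectors but avoids oscillation on bipartite graphs such as the hypothetical star network used here.

```python
import math

def eigenvector_centrality(a, iterations=200):
    """Approximate the leading eigenvector of a symmetric 0/1 matrix a.

    Each step multiplies by (A + I) and rescales so the largest entry is 1.
    """
    n = len(a)
    c = [1.0] * n
    for _ in range(iterations):
        shifted = [c[i] + sum(a[i][j] * c[j] for j in range(n)) for i in range(n)]
        top = max(shifted)
        c = [value / top for value in shifted]
    return c

# Hypothetical star network: node 0 is connected to nodes 1, 2 and 3.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
centrality = eigenvector_centrality(star)

# Recover the eigenvalue from the first component of A c = lambda c.
lambda_max = sum(star[0][j] * centrality[j] for j in range(4)) / centrality[0]
expected_leaf = 1 / math.sqrt(3)  # analytic value for a 3-leaf star
```

For the star, the hub has centrality 1 and each leaf about 0.577, with a leading eigenvalue of about 1.732.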

The (in/out)degree and eigenvector centralities are 
very intuitive and relatively easy to calculate. However, 
they are all local measures and the network outside of 
the immediate vicinity of a node - i.e., outside the “ego 
network” - has no influence on them. 

The betweenness quantifies the number of times a node
acts as a bridge along the shortest path linking two other
nodes. It captures the importance of a position within
a whole network and can be interpreted as a measure of
how much control over the flow of information a node
has. It is given by

$$C_B(k) = \sum_{i \neq j \neq k} \frac{\sigma_{ij}(k)}{\sigma_{ij}},$$

where $\sigma_{ij}(k)$ is the number of shortest paths linking node
$i$ to node $j$ that pass through node $k$, and $\sigma_{ij}$ is the number of
shortest paths linking node $i$ to node $j$.
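
The betweenness sum can be evaluated directly on small networks. The Python sketch below counts shortest paths with a breadth-first search from every node and then applies the definition over ordered pairs $(i, j)$; it is a brute-force illustration on a hypothetical 4-node path, not the routine used in the study. Because ordered pairs are counted, values for an undirected network are twice those of the unordered-pair convention.

```python
from collections import deque

def bfs_counts(adj, source):
    """Return (dist, sigma): shortest distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                sigma[w] += sigma[v]
    return dist, sigma

def betweenness(adj):
    """Sum of sigma_ij(k) / sigma_ij over ordered pairs i != j, both different from k."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {k: 0.0 for k in adj}
    for i in adj:
        dist_i, sigma_i = info[i]
        for j in dist_i:
            if j == i:
                continue
            for k in adj:
                if k == i or k == j or k not in dist_i:
                    continue
                dist_k, sigma_k = info[k]
                # k is on a shortest i-j path iff the distances add up exactly
                if j in dist_k and dist_i[k] + dist_k[j] == dist_i[j]:
                    score[k] += sigma_i[k] * sigma_k[j] / sigma_i[j]
    return score

# Hypothetical path network 0 - 1 - 2 - 3, stored with edges in both directions.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = betweenness(path)
```

The two interior nodes each score 4.0 and the end nodes score 0.0, even though every node has at most two ties, which illustrates that betweenness depends on position rather than on the number of connections.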

FIG. 2: In each of the following networks, X has higher centrality
than Y according to (a) indegree, (b) outdegree, (c)
eigenvector, (d) betweenness and (e) closeness.

The closeness is the inverse of the sum of distances
from all other nodes. It emphasizes a node's independence:
a node that is close to many other nodes can
easily reach others without having to rely much on intermediaries,
thus gaining easy access to information in
the network. It is a measure of how near an individual is
to all other nodes in a network. Closeness is defined as

$$C_C(i) = \left[\sum_{j=1}^{n} d_{ij}\right]^{-1},$$

where $d_{ij}$ is the shortest distance connecting node $i$ to
node $j$. The network from survey SNA4 using the closeness
as a measure of importance is visualized in Fig. 3.
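
A minimal Python sketch of the closeness definition follows, using breadth-first search for the shortest distances $d_{ij}$ measured from node $i$ outward. The path network is hypothetical, and returning 0 when some node is unreachable is an assumption of this sketch; the text does not state how disconnected nodes were handled.

```python
from collections import deque

def distances_from(adj, source):
    """Shortest-path distances (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness(adj, i):
    """Inverse of the sum of distances from node i to all other nodes."""
    dist = distances_from(adj, i)
    if len(dist) < len(adj):
        return 0.0  # assumed convention: an unreachable node gives closeness 0
    total = sum(dist.values())
    return 1.0 / total if total else 0.0

# Hypothetical path network 0 - 1 - 2 - 3, stored with edges in both directions.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
end_closeness = closeness(path, 0)     # distances 1 + 2 + 3 = 6
middle_closeness = closeness(path, 1)  # distances 1 + 1 + 2 = 4
```

The interior node has the larger value (1/4 versus 1/6), consistent with higher closeness meaning shorter distances to everyone else.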

To investigate correlations between the students' centralities,
gender, ethnicity, major of study, final grade
and their persistence in MI, a logistic regression model
(LRM) was used. To avoid confounding factors we performed
multivariate logistic regression. All variables significant
in the univariate analysis were incorporated into
the multivariate model. The comparison of goodness of
fit of multivariate and univariate models was performed
using the likelihood ratio test, with the null hypothesis
stating that the univariate model predicts persistence
as well as the multivariate one. The variance inflation factor (VIF) was
calculated to estimate how much the variance of a coefficient
was inflated because of linear dependence with
other predictors. Finally, the mutual information approach
was used to find the most significant split into
the predicting/non-predicting categories for each of the
centrality measures and the chi-square test was used to
verify the significance of this split [8]. For the statistical analysis
we used the R program [9]. We considered results
with p < 0.05 as significant.
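
The mutual-information split can be sketched as follows: for each candidate threshold, the centrality is turned into a binary variable (at or above the threshold versus below it), and the threshold that maximizes the mutual information with the binary persistence outcome is kept. This Python sketch is one plausible reading of the procedure; the data values and the helper names `mutual_information` and `best_threshold` are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (count / n) * math.log2(count * n / (px[a] * py[b]))
        for (a, b), count in joint.items()
    )

def best_threshold(values, outcomes):
    """Threshold t maximizing the mutual information between (value >= t) and outcome.

    Assumes at least two distinct values, so that every candidate split
    leaves students on both sides.
    """
    candidates = sorted(set(values))[1:]
    return max(
        candidates,
        key=lambda t: mutual_information([v >= t for v in values], outcomes),
    )

# Hypothetical data: a degree-like centrality and persistence (1 = enrolled again).
values = [0, 0, 1, 2, 3, 4]
outcomes = [0, 0, 1, 1, 1, 1]
theta = best_threshold(values, outcomes)
info_at_theta = mutual_information([v >= theta for v in values], outcomes)
```

In this toy data the split at 1 separates the two outcome groups perfectly, so the mutual information equals the entropy of the outcome (about 0.918 bits); the significance of a chosen split would then be checked separately with a chi-square test.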

III. FINDINGS 

We analyze how a student's position within a social
network in an M-MI course, which strongly emphasizes
interactive learning, impacts their persistence in taking
a subsequent EM-MI course. We consider two cases: (1)
students' persistence in physics, i.e., taking any form of
second-semester physics, and (2) students' persistence
in MI. We are interested in interactions between students
and therefore we excluded all instructional staff from the
network. Using the Wilcoxon rank-sum test we found
no evidence of a statistically significant difference between
the two population medians (i.e., with and without
instructors) for all centralities but closeness. Thus,
for this last measure we considered two cases: without
(closeness) and with (closenessINS) instructors.

As shown in Table I, we found statistically significant
positive correlations with total degree, indegree and closeness
for persistence in physics. For MI we found statistically
significant correlations only for measures considering the
entire social network, that is betweenness and closeness,
and no statistically significant correlations for measures
aimed at the students' ego networks.

FIG. 3: (Color online) The graph representation of the social
network resulting from the SNA4 data. The nodes represent
students enrolled in the M-MI in Fall 2014 and ties represent
the directed interactions. The size of each node corresponds
to the student's closeness centrality.

To determine whether our univariate models can be
improved we considered nested multivariate models for
all the statistically significant centrality measures, with a
student's gender, ethnicity, academic plan (declared major)
and final grade considered as additional predictors
of persistence. We found that only the grade
made a statistically significant difference in the model
fits. Table II summarizes the results of the logistic regression
for both the in-physics and in-MI cases. However,
when we compared the fit of the multivariate models to
the fit of the models reduced to grade as the sole predicting
variable, we found a significantly better fit only for the
full betweenness model ($\chi^2(1) = 7.89$, $p = 0.005$). The
variance inflation factor indicates the lack of collinearity
between betweenness and grade (VIF = 1.03).
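
The p-value quoted for the betweenness model can be checked from the test statistic alone. For one degree of freedom, the upper-tail probability of the chi-square distribution is $\mathrm{erfc}(\sqrt{x/2})$, so the Python standard library suffices. The sketch below is an illustration only; the function names are hypothetical, and the log-likelihoods passed to `lr_statistic` are made-up numbers chosen to reproduce the reported statistic.

```python
import math

def lr_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic for two nested models: 2 * (l_full - l_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Reported statistic for the betweenness + grade model versus grade alone.
p_value = chi2_sf_df1(7.89)

# Hypothetical log-likelihoods that would yield the same statistic.
example_statistic = lr_statistic(-40.0, -36.055)
```

Evaluating `p_value` gives approximately 0.005, in agreement with the value reported above.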

Finally, to optimize the correlation and to determine
the predictability threshold for centralities we used the
mutual information. Table III shows the threshold values
for each centrality measure and its significance level.


TABLE I: Summary of the univariate logistic regression for
persistence as predicted by various centrality measures. We
considered only networks without instructional staff for all
measures except closeness, for which we analyzed two cases.
ClosenessINS indicates a network with instructors. Significant
p-values are marked with an asterisk.

                     In physics              In MI
    Centrality       Estimate   p-value      Estimate   p-value
    Total degree        0.20    0.016*          0.12    0.106
    In degree           0.37    0.018*          0.22    0.135
    Out degree          0.27    0.061           0.16    0.188
    Eigenvector         0.76    0.516           0.38    0.697
    Betweenness       -20.40    0.062         -26.22    0.043*
    Closeness         113.36    0.037*         94.82    0.032*
    ClosenessINS      119.63    0.035*        100.09    0.030*


TABLE II: Summary of the likelihood ratio test performed
for the multivariate logistic regression with a student's final
grade considered as an additional predictor of the persistence
when compared to the simple models.

                  Model
                  (Centrality + Grade)   df   $\chi^2$   p-value
    In physics    Total degree           2    25.75      $< 10^{-3}$
                  In degree              1    25.83      $< 10^{-3}$
                  Closeness              1    26.63      $< 10^{-3}$
                  ClosenessINS           1    26.58      $< 10^{-3}$
    In MI         Betweenness            1    15.28      $< 10^{-3}$
                  Closeness              1    11.17      $< 10^{-3}$
                  ClosenessINS           1    11.14      $< 10^{-3}$

IV. DISCUSSION 

We find that students with higher values of certain centrality
measures at the end of the first semester of MI are in
fact more likely to enroll in a second semester of physics.
For the MI sequence, we found that students with high
closeness seem to be more likely to enroll in a second
semester of MI, while (in/out/total) degree has no effect
on their decision. On the other hand, students with high
betweenness scores tend either to switch to the traditional curriculum
or to leave physics altogether. Moreover, higher
grades strengthen this negative correlation; that is, students
with higher final grades and high betweenness are
the most likely to leave MI but remain in physics.

To explain this discrepancy one needs to understand
what these two measures mean in practice. Closeness can
be thought of as strong embeddedness within the entire
network. Students with high closeness scores are close to
all the other students in the network and thus they have
easy access to information from many sources. They
are also, by the sheer nature of this measure, connected
to many students. This can help them appreciate all the
benefits of having a strong network of connections within
a classroom. Betweenness, on the other hand, depends
mainly on the position within the network. In practice,
in order to have high betweenness it suffices to connect
clusters that are otherwise separate. Thus, students with
high betweenness scores are not necessarily connected to
many other students.


TABLE III: The threshold value ($\theta$) for each centrality measure
as determined by maximization of the mutual information
and its significance level measured by the chi-square test.

                  Centrality      $\theta$   df   $\chi^2$   p-value
    In physics    Total degree    1          1    11.05      $< 10^{-3}$
                  In degree       1          1     7.37      0.007
                  Closeness       0.013      1     8.62      0.003
                  ClosenessINS    0.012      1    11.61      $< 10^{-3}$
    In MI         Closeness       0.022      1     4.53      0.033
                  ClosenessINS    0.021      1     4.53      0.033

For physics in general we found a statistically significant
positive correlation of persistence with (in)degree and closeness.
However, due to the small sample of students who
took a non-MI physics course, this finding requires further study.

It should be kept in mind that a centrality which is 
appropriate for one category will often “get it wrong” 
when applied to a different category. More importantly, 
while centralities identify the most important vertices in 
a given network, this ranking cannot be generalized to 
the remaining vertices with lower scores - centrality does 
not indicate the relative importance of all vertices. 

While the correlation with increased persistence is an 
ongoing study, these findings suggest that student social 
integration influences persistence. 


Acknowledgments 

We would like to thank Geoff Potvin for facilitating
data collection, Eric Williams for helpful discussions and
Anita Dąbrowska for feedback and recommendations on
statistical analyses. Supported by NSF PHY 1344247.


[1] V. Tinto, Rev. Educ. Res. 45, 1 (1975). 

[2] V. Tinto, J. High. Educ. 68, 6 (1997). 

[3] S. Wasserman and K. Faust, Social Network Analysis: 
Methods and Applications, (Cambridge Univ. Press, 1994). 

[4] J. Scott and P. J. Carrington, eds. The SAGE Handbook of 
Social Network Analysis, (SAGE Publications Ltd, 2011). 

[5] E. Brewe, L. Kramer and V. Sawtelle, Phys. Rev. ST
PER 8, 010101 (2012).

[6] J. Forsman, R. Moll and C. Linder, Phys. Rev. ST PER
10, 020122 (2014).


[7] In this context, important can mean the ability to transfer
information across the network or it can be understood as
involvement in the cohesiveness of the network.

[8] The mutual information is an information theoretic measure
of dependency between two random variables.

[9] R Core Team (2015). R: A language and environment for
statistical computing. R Foundation for Statistical Computing,
Vienna, Austria.



